Improving Minority Class Prediction Using Case-Specific Feature Weights
Authors
Abstract
This paper addresses the problem of handling skewed class distributions within the case-based learning (CBL) framework. We first present as a baseline an information-gain-weighted CBL algorithm and apply it to three data sets from natural language processing (NLP) with skewed class distributions. Although overall performance of the baseline CBL algorithm is good, we show that the algorithm exhibits poor performance on minority class instances. We then present two CBL algorithms designed to improve the performance of minority class predictions. Each variation creates test-case-specific feature weights by first observing the path taken by the test case in a decision tree created for the learning task, and then using path-specific information gain values to create an appropriate weight vector for use during case retrieval. When applied to the NLP data sets, the algorithms are shown to significantly increase the accuracy of minority class predictions while maintaining or improving overall classification accuracy.
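The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the tree is built or how the weight vector is assembled, so the code assumes an unpruned ID3-style tree over symbolic features, gives each feature tested on the test case's path the information gain computed at that node, gives every other feature a weight of zero, and retrieves cases with a weighted overlap score. The names `info_gain`, `global_weights`, `path_weights`, `classify`, and the toy data are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(cases, labels, f):
    """Information gain of symbolic feature index f on the given cases."""
    groups = {}
    for case, y in zip(cases, labels):
        groups.setdefault(case[f], []).append(y)
    remainder = sum(len(g) / len(cases) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def global_weights(cases, labels):
    """Baseline: one information-gain weight per feature, shared by all test cases."""
    return [info_gain(cases, labels, f) for f in range(len(cases[0]))]

def path_weights(cases, labels, test_case):
    """Assumed variant: follow the test case down an ID3-style tree, built
    lazily along its path only, recording the gain measured at each node."""
    weights = [0.0] * len(test_case)
    remaining = set(range(len(test_case)))
    while remaining and len(set(labels)) > 1:
        gains = {f: info_gain(cases, labels, f) for f in remaining}
        best = max(sorted(gains), key=gains.get)
        if gains[best] <= 0:
            break
        weights[best] = gains[best]
        remaining.discard(best)
        # Keep only the training cases that share the test case's branch.
        subset = [(c, y) for c, y in zip(cases, labels) if c[best] == test_case[best]]
        if not subset:
            break
        cases, labels = zip(*subset)
    return weights

def classify(cases, labels, test_case, weights, k=1):
    """Weighted-overlap k-nearest-neighbor retrieval with majority vote."""
    def score(i):
        return sum(w for w, a, b in zip(weights, cases[i], test_case) if a == b)
    nearest = sorted(range(len(cases)), key=lambda i: -score(i))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Toy skewed data set (invented for illustration): 5 "neg" cases, 2 "pos" cases.
CASES = [("a", "x"), ("a", "y"), ("a", "x"), ("a", "y"),
         ("b", "x"), ("b", "y"), ("b", "y")]
LABELS = ["neg", "neg", "neg", "neg", "neg", "pos", "pos"]

if __name__ == "__main__":
    test_case = ("b", "y")
    w = path_weights(CASES, LABELS, test_case)
    print(global_weights(CASES, LABELS), w, classify(CASES, LABELS, test_case, w))
```

On this toy data the second feature is weak over the whole training set but highly informative once the test case has followed the `"b"` branch, so its path-specific weight exceeds its global one. That is the kind of effect the abstract attributes to test-case-specific weights, shown here only under the assumptions stated above.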
Similar resources
Improving Minority Class Prediction Using Case-Specific Feature Weights
This paper addresses the problem of handling skewed class distributions within the case-based learning (CBL) framework. We first present as a baseline an information-gain-weighted CBL algorithm and apply it to three data sets from natural language processing (NLP) with skewed class distributions. Although overall performance of the baseline CBL algorithm is good, we show that the algorithm exhibi...
A Minimum Risk Metric for Nearest Neighbor Classification
nale. Retrieval in a prototype-based case library: A case study in diabetes therapy revision. [CH97] C. Cardie and N. Howe. Improving minority class prediction using case-specific feature weights. [CS93] Scott Cost and Steven Salzberg. A weighted nearest neighbor algorithm for learning with symbolic features. [DP97] Pedro Domingos and Michael Pazzani. On the optimality of the simple Bayesian clas-si...
SMOTEBoost: Improving Prediction of the Minority Class in Boosting
Many real world data mining applications involve learning from imbalanced data sets. Learning from data sets that contain very few instances of the minority (or interesting) class usually produces biased classifiers that have a higher predictive accuracy over the majority class(es), but poorer predictive accuracy over the minority class. SMOTE (Synthetic Minority Over-sampling TEchnique) is spe...
Improving the Quality of Minority Class Identification in Dialog Act Tagging
We present a method of improving the performance of dialog act tagging in identifying minority classes by using per-class feature optimization and a method of choosing the class based not on confidence, but on a cascade of classifiers. We show that it gives a minority class F-measure error reduction of 22.8%, while also reducing the error for other classes and the overall error by about 10%.
Evaluation of Classifiers in Software Fault-Proneness Prediction
Reliability of software counts on its fault-prone modules. This means that the less software consists of fault-prone units the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules of software, it will be possible to judge the software reliability. In predicting software fault-prone modules, one of the contributing features is software metric by which one ...